Domain‐independent automatic keyphrase indexing with small training sets
Internal identifier: 000A14 (Main/Exploration); previous: 000A13; next: 000A15
Authors: Olena Medelyan [New Zealand]; Ian H. Witten [New Zealand]
Source:
- Journal of the American Society for Information Science and Technology [1532-2882]; 2008-05.
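French descriptors
- Pascal (Inist): Extraction information, Indexation, Indexation automatique, Système information, Vocabulaire contrôlé
English descriptors
- KwdEn: Automatic indexing, Controlled vocabulary, Indexing, Information extraction, Information system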
Abstract
Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain‐specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Url: https://api.istex.fr/document/B7ADEE369E44E8C2FD9532F4554A0A0F18548E84/fulltext/pdf
DOI: 10.1002/asi.20790
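Affiliations: Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240, New Zealand (both authors)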
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000669
- to stream Istex, to step Curation: 000635
- to stream Istex, to step Checkpoint: 000459
- to stream Main, to step Merge: 000A14
- to stream PascalFrancis, to step Corpus: 000030
- to stream PascalFrancis, to step Corpus: 000033
- to stream PascalFrancis, to step Curation: 000068
- to stream PascalFrancis, to step Checkpoint: 000028
- to stream Main, to step Merge: 000A47
- to stream Main, to step Curation: 000A14
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Domain‐independent automatic keyphrase indexing with small training sets</title>
<author><name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
</author>
<author><name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:B7ADEE369E44E8C2FD9532F4554A0A0F18548E84</idno>
<date when="2008" year="2008">2008</date>
<idno type="doi">10.1002/asi.20790</idno>
<idno type="url">https://api.istex.fr/document/B7ADEE369E44E8C2FD9532F4554A0A0F18548E84/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000669</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000669</idno>
<idno type="wicri:Area/Istex/Curation">000635</idno>
<idno type="wicri:Area/Istex/Checkpoint">000459</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000459</idno>
<idno type="wicri:doubleKey">1532-2882:2008:Medelyan O:domain:independent:automatic</idno>
<idno type="wicri:Area/Main/Merge">000A14</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:09-0056665</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000030</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000033</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000068</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000028</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000028</idno>
<idno type="wicri:doubleKey">1532-2882:2008:Medelyan O:domain:independent:automatic</idno>
<idno type="wicri:Area/Main/Merge">000A47</idno>
<idno type="wicri:Area/Main/Curation">000A14</idno>
<idno type="wicri:Area/Main/Exploration">000A14</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Domain‐independent automatic keyphrase indexing with small training sets</title>
<author><name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
<affiliation wicri:level="1"><country xml:lang="fr">Nouvelle-Zélande</country>
<wicri:regionArea>Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240</wicri:regionArea>
<wicri:noRegion>Hamilton 3240</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Nouvelle-Zélande</country>
</affiliation>
</author>
<author><name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
<affiliation wicri:level="1"><country xml:lang="fr">Nouvelle-Zélande</country>
<wicri:regionArea>Department of Computer Science, University of Waikato, Private Bag 3105, Hamilton 3240</wicri:regionArea>
<wicri:noRegion>Hamilton 3240</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Nouvelle-Zélande</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Journal of the American Society for Information Science and Technology</title>
<title level="j" type="abbrev">J. Am. Soc. Inf. Sci.</title>
<idno type="ISSN">1532-2882</idno>
<idno type="eISSN">1532-2890</idno>
<imprint><publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2008-05">2008-05</date>
<biblScope unit="volume">59</biblScope>
<biblScope unit="issue">7</biblScope>
<biblScope unit="page" from="1026">1026</biblScope>
<biblScope unit="page" to="1040">1040</biblScope>
</imprint>
<idno type="ISSN">1532-2882</idno>
</series>
<idno type="istex">B7ADEE369E44E8C2FD9532F4554A0A0F18548E84</idno>
<idno type="DOI">10.1002/asi.20790</idno>
<idno type="ArticleID">ASI20790</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1532-2882</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic indexing</term>
<term>Controlled vocabulary</term>
<term>Indexing</term>
<term>Information extraction</term>
<term>Information system</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Extraction information</term>
<term>Indexation</term>
<term>Indexation automatique</term>
<term>Système information</term>
<term>Vocabulaire contrôlé</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain‐specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.</div>
</front>
</TEI>
<affiliations><list><country><li>Nouvelle-Zélande</li>
</country>
</list>
<tree><country name="Nouvelle-Zélande"><noRegion><name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
</noRegion>
<name sortKey="Medelyan, Olena" sort="Medelyan, Olena" uniqKey="Medelyan O" first="Olena" last="Medelyan">Olena Medelyan</name>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
<name sortKey="Witten, Ian H" sort="Witten, Ian H" uniqKey="Witten I" first="Ian H." last="Witten">Ian H. Witten</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Agronomie/explor/SisAgriV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A14 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A14 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Agronomie |area= SisAgriV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:B7ADEE369E44E8C2FD9532F4554A0A0F18548E84 |texte= Domain‐independent automatic keyphrase indexing with small training sets }}
This area was generated with Dilib version V0.6.28.